Smoothing a PBSMT Model by Factoring out Adjuncts
Abstract
Phrase-Based Statistical Machine Translation (PBSMT) became a leading paradigm in Statistical Machine Translation after its introduction in 2003. From the start, researchers have tried to improve PBSMT with linguistic knowledge, often by incorporating syntactic information into the model. This thesis proposes a simple approach to improving PBSMT using a general linguistic notion, that of adjuncts, or modifiers: in structurally similar languages such as French and English, adjuncts in one language are expected to be translated as adjuncts in the other. After verifying this assumption, the thesis describes how adjunct pairs are deleted from a bilingual corpus to generate new training data for a model, which is then used to smooth a PBSMT baseline. Experiments on a smoothed French-English model show only a marginal improvement over the baseline. It appears that few of the phrase pairs gained by adjunct-pair deletion are actually used in testing, so that the improvement in performance mostly results from successful smoothing. Further research could determine how far performance can be improved for this system, and could apply adjunct-pair deletion to other language pairs and to hierarchical SMT models.
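The adjunct-pair deletion step described in the abstract can be illustrated with a minimal sketch. It assumes that adjunct spans have already been identified and paired across the two sentences, and it enumerates every non-empty subset of pairs; both the input format and the subset enumeration are illustrative assumptions, not necessarily the procedure used in the thesis. The function names `delete_spans` and `adjunct_deleted_pairs` are hypothetical.

```python
from itertools import combinations


def delete_spans(tokens, spans):
    """Return tokens with every half-open (start, end) span removed."""
    drop = {i for start, end in spans for i in range(start, end)}
    return [tok for i, tok in enumerate(tokens) if i not in drop]


def adjunct_deleted_pairs(src, tgt, adjunct_pairs):
    """Yield reduced (source, target) sentence pairs.

    adjunct_pairs is a list of ((src_start, src_end), (tgt_start, tgt_end))
    token spans, each linking a source adjunct to its target counterpart.
    Assumption: one reduced pair is produced per non-empty subset of pairs.
    """
    for k in range(1, len(adjunct_pairs) + 1):
        for subset in combinations(adjunct_pairs, k):
            src_spans = [s for s, _ in subset]
            tgt_spans = [t for _, t in subset]
            yield delete_spans(src, src_spans), delete_spans(tgt, tgt_spans)


# Hypothetical French-English example with two paired adjuncts:
# "noir" <-> "black" and "sur le tapis" <-> "on the mat".
src = "le chat noir dort sur le tapis".split()
tgt = "the black cat sleeps on the mat".split()
pairs = [((2, 3), (1, 2)), ((4, 7), (4, 7))]

for reduced_src, reduced_tgt in adjunct_deleted_pairs(src, tgt, pairs):
    print(" ".join(reduced_src), "|||", " ".join(reduced_tgt))
```

The reduced sentence pairs would then be added as extra training data for a second model; how that model is combined with the baseline for smoothing is not specified in the abstract and is left out of this sketch.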
Similar resources
Learning Machine Translation from In-domain and Out-of-domain Data
The performance of Phrase-Based Statistical Machine Translation (PBSMT) systems mostly depends on training data. Many papers have investigated how to create new resources in order to increase the size of the training corpus in an attempt to improve PBSMT performance. In this work, we analyse and characterize the way in which the in-domain and out-of-domain performance of PBSMT is impacted when t...
Factoring Adjunction in Hierarchical Phrase-Based SMT
While much work has been done to inform Hierarchical Phrase-Based SMT (Chiang, 2005) models linguistically, the adjunct/argument distinction has generally not been exploited for these models. But as Shieber (2007) points out, capturing this distinction makes it possible to abstract over ‘intervening’ adjuncts, and is thus relevant for (machine) translation in general. We contribute an adjunction-driven ap...
Hany Hassan, Khalil Sima’an and Andy Way
Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent successful enrichments of PBSMT with hierarchical structure either employ non-linguistically motivated syntax for capturing hierarchical reordering phenomena, or extend the phrase translation table with redundantly ambiguous synt...
Two-step Smoothing Estimation of the Time-variant Parameter with Application to Temperature Data
In this article, we develop two nonparametric smoothing estimators for the parameter of a time-variant parametric model. This parameter can be from any parametric family or from any parametric or semi-parametric regression model. Estimation is based on a two-step procedure, in which we first get the raw estimate of the parameter at a set of disjoint time...
Supertagged Phrase-Based Statistical Machine Translation
Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic structure caused system performance to deteriorate. In this work we show that incorporating lexical syntactic descriptions in the form of supertags can yield significantly better PBSMT systems. We describe a novel PBSMT model that integrates supertags into the target language model and the target...
Publication date: 2011